Towards Distributional Semantics-based Classification of Collocations for Collocation Dictionaries

Author

  • Leo Wanner
Abstract

Automatic acquisition of raw source material is of great aid for the compilation of dictionaries, and, in particular, of specialized dictionaries such as collocation dictionaries. The extraction of collocations from corpora has been actively worked on since the late eighties. The quality of the state-of-the-art extraction algorithms allows the lexicographers to obtain lists of collocations they can work with. However, mere lists of collocations are not sufficient. In collocation dictionaries, collocations are grouped semantically, which also presupposes a semantic classification of collocations. In this article, a distributional semantics-based model is proposed that classifies collocations with respect to broad semantic categories as encountered in dictionaries. In experiments with Spanish verb-noun and noun-adjective collocations from the lexicographic field of emotion nouns, it is shown that the use of features extracted from the context of collocations is decisive for retrieval of draft entries for collocation dictionaries.
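To make the idea of classifying collocations from their context more concrete, the following is a minimal sketch, not the model from the article. The abstract does not specify the features or the classifier, so this assumes plain bag-of-words context counts and a nearest-centroid decision by cosine similarity. The category labels ("intense", "begin"), the toy English corpus, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def context_vector(sentences, collocation):
    """Count the words that co-occur with a collocation (its own words excluded)."""
    own = set(collocation)
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        if own.issubset(tokens):  # both collocation elements occur in the sentence
            counts.update(t for t in tokens if t not in own)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Sum the vectors of one category; cosine similarity ignores the scale."""
    total = Counter()
    for vector in vectors:
        total.update(vector)
    return total


def classify(vector, centroids):
    """Return the category whose centroid is most similar to the vector."""
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Hypothetical toy corpus and labelled training collocations (illustration only).
corpus = [
    "she felt deep sadness and it was overwhelming and strong",
    "a strong overwhelming immense joy filled him",
    "he suddenly began to feel fear when the noise started",
    "anger suddenly started to arise when he heard it",
    "the intense anger was overwhelming and strong",
]
training = {
    "intense": [("deep", "sadness"), ("immense", "joy")],
    "begin": [("feel", "fear"), ("arise", "anger")],
}

centroids = {
    label: centroid(context_vector(corpus, c) for c in collocations)
    for label, collocations in training.items()
}
predicted = classify(context_vector(corpus, ("intense", "anger")), centroids)
print(predicted)
```

In this sketch an unseen collocation is assigned to the category whose training collocations appear in the most similar contexts. A realistic setup would need a large corpus, lemmatization, and a properly evaluated classifier.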

Similar resources

Measuring the Compositionality of Collocations via Word Co-occurrence Vectors: Shared Task System Description

A description of a system for measuring the compositionality of collocations within the framework of the shared task of the Distributional Semantics and Compositionality workshop (DISCo 2011) is presented. The system exploits the intuition that a highly compositional collocation would tend to have a considerable semantic overlap with its constituents (headword and modifier) whereas a collocatio...

Towards a corpus-based dictionary of German noun-verb collocations

We describe our attempts to automatically extract raw material for a dictionary of German noun-verb collocations from large corpora of newspaper text. Such a dictionary should be about collocations and it should include a description of their linguistic properties, rather than listing the mere lexical cooccurrence. Since most statistical collocation finding tools do not provide other than lexic...

Using chunked corpora for the acquisition of collocations and idiomatic expressions

This paper discusses the use of recursive chunking of large German corpora (over 300 million words) for the identification and partial classification of significant lexical cooccurrences of adjectives and verbs. The goal is to provide a fine-grained syntactic classification of the data at the levels of subcategorization and scrambling. We analyze the combinatory preferences of adjectives with ...

Japanese Learners’ Dictionary of I-adjective-noun Collocations

This paper demonstrates a method for creating a Japanese learners' dictionary of i-adjective-noun collocations. After an introduction of the importance of collocations and the necessity of their inclusion in Japanese language learning, we present various corpora types and corpus query tools that are used to obtain a variety of collocational usage in different types of discourse. The Japanese languag...

Modeling the Non-Substitutability of Multiword Expressions with Distributional Semantics and a Log-Linear Model

Non-substitutability is a property of Multiword Expressions (MWEs) that often causes lexical rigidity and is relevant for most types of MWEs. Efficient identification of this property can result in the efficient identification of MWEs. In this work we propose using distributional semantics, in the form of word embeddings, to identify candidate substitutions for a candidate MWE and model its sub...

Publication date: 2016